Learning Convolutional Text Representations for Visual Question Answering
نویسندگان
چکیده
Visual question answering is a recently proposed articial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to traditional computer vision tasks, such as object recognition and image classication, visual question answering raises a dierent need for textual representation as compared to other natural language processing tasks. In this work, we perform a detailed analysis on natural language questions in visual question answering. Based on the analysis, we propose to rely on convolutional neural networks for learning textual representations. By exploring the various properties of convolutional neural networks specialized for text data, such as width and depth, we present our “CNN Inception + Gate” model. We show that our model improves question representations and thus the overall accuracy of visual question answering models. We also show that the text representation requirement in visual question answering is more complicated and comprehensive than that in conventional natural language processing tasks, making it a beer task to evaluate textual representation methods. Shallow models like fastText, which can obtain comparable results with deep learning models in tasks like text classication, are not suitable in visual question answering.
منابع مشابه
Evaluating Multimodal Representations on Sentence Similarity: vSTS, Visual Semantic Textual Similarity Dataset
The success of word representations (embeddings) learned from text has motivated analogous methods to learn representations of longer sequences of text such as sentences, a fundamental step on any task requiring some level of text understanding [13]. Sentence representation is a challenging task that has to consider aspects such as compositionality, phrase similarity, negation, etc. In order to...
متن کاملOpen-Ended Visual Question-Answering
This thesis studies methods to solve Visual Question-Answering (VQA) tasks with a Deep Learning framework. As a preliminary step, we explore Long Short-Term Memory (LSTM) networks used in Natural Language Processing (NLP) to tackle Question-Answering (text based). We then modify the previous model to accept an image as an input in addition to the question. For this purpose, we explore the VGG-1...
متن کاملLearning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
متن کاملConvolutional Encoding in Bidirectional Attention Flow for Question Answering
Deep learning systems for complex natural language processing tasks like question answering are often large, cumbersome models that require excessive computational power and time. We seek to address this issue by exploring efficient and parallelizable alternatives to the more computationally expensive components of one of the top-performing question-answering architectures. In particular, we ex...
متن کاملABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering
We propose a novel attention based deep learning architecture for visual question answering task (VQA). Given an image and an image-related question, VQA returns a natural language answer. Since different questions inquire about the attributes of different image regions, generating correct answers requires the model to have questionguided attention, i.e., the attention on the regions correspond...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1705.06824 شماره
صفحات -
تاریخ انتشار 2017